AITopics

2605.06686

Country: North America > United States (0.49)

Genre: Research Report > New Finding (0.66)

Industry:

Government > Immigration & Customs (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.90)
Government > Regional Government (0.90)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Siro, Clemencia, Aliannejadi, Mohammad, de Rijke, Maarten

Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs

arXiv.org Artificial Intelligence, Apr-29-2024

In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback. In a conversational setting such signals are usually unavailable due to the nature of the interactions, and, instead, the evaluation often relies on crowdsourced evaluation labels. The role of user feedback in annotators' assessment of turns in a conversational perception has been little studied. We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of a turn being evaluated. We explore and compare two methodologies for assessing TDSs: one that includes the user's follow-up utterance and one that does not. We use both crowdworkers and large language models (LLMs) as annotators to assess system responses across four aspects: relevance, usefulness, interestingness, and explanation quality. Our findings indicate that there is a distinct difference in ratings assigned by both annotator groups in the two setups, indicating that user feedback does influence system evaluation. Workers are more susceptible to user feedback on usefulness and interestingness, whereas LLMs are more susceptible on interestingness and relevance. User feedback leads to a more personalized assessment of usefulness by workers, aligning closely with the user's explicit feedback. Additionally, in cases of ambiguous or complex user requests, user feedback improves agreement among crowdworkers. These findings emphasize the significance of user feedback in refining system evaluations and suggest the potential for automated feedback integration in future research. We publicly release the annotated data to foster research in this area.

follow-up utterance, user feedback, utterance, (15 more...)

doi: 10.1145/3626772.3657712

2404.12994

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)
North America > United States > District of Columbia > Washington (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.05)
(23 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Film (0.68)
Leisure & Entertainment (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Sivaganeshan, Aravinth, de Silva, Nisansa

Fine Tuning Named Entity Extraction Models for the Fantasy Domain

arXiv.org Artificial Intelligence, Feb-16-2024

Named Entity Recognition (NER) is a sequence classification Natural Language Processing task where entities are identified in the text and classified into predefined categories. It acts as a foundation for most information extraction systems. Dungeons and Dragons (D&D) is an open-ended tabletop fantasy game with its own diverse lore. D&D entities are domain-specific and are thus unrecognizable by even the state-of-the-art off-the-shelf NER systems, as these systems are trained on general data for predefined categories such as person (PERS), location (LOC), organization (ORG), and miscellaneous (MISC). For meaningful extraction of information from fantasy text, the entities need to be classified into domain-specific entity categories and the models need to be fine-tuned on a domain-relevant corpus. This work uses available lore of monsters in the D&D domain to fine-tune Trankit, which is a prolific NER framework that uses a pre-trained model for NER. Upon this training, the system acquires the ability to extract monster names from relevant domain documents under a novel NER tag. This work compares the accuracy of the monster name identification against the zero-shot Trankit model and two FLAIR models. The fine-tuned Trankit model achieves an 87.86% F1 score, surpassing all the other considered models.

association map, lore data, trankit, (13 more...)

doi: 10.1109/MERCon60487.2023.10355501

2402.10662

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada (0.04)
Asia > Sri Lanka (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Games > Computer Games (0.88)
Leisure & Entertainment > Sports (0.53)
Leisure & Entertainment > Gambling (0.53)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Petäinen, Liisa, Väyrynen, Juha P., Ruusuvuori, Pekka, Pölönen, Ilkka, Äyrämö, Sami, Kuopio, Teijo

Domain-specific transfer learning in the automated scoring of tumor-stroma ratio from histopathological images of colorectal cancer

arXiv.org Artificial Intelligence, Dec-30-2022

Tumor-stroma ratio (TSR) is a prognostic factor for many types of solid tumors. In this study, we propose a method for automated estimation of TSR from histopathological images of colorectal cancer. The method is based on convolutional neural networks which were trained to classify colorectal cancer tissue in hematoxylin-eosin stained samples into three classes: stroma, tumor and other. The models were trained using a data set that consists of 1343 whole slide images. Three different training setups were applied with a transfer learning approach using domain-specific data, i.e., an external colorectal cancer histopathological data set. The three most accurate models were chosen as a classifier, TSR values were predicted and the results were compared to a visual TSR estimation made by a pathologist. The results suggest that classification accuracy does not improve when domain-specific data are used in the pre-training of the convolutional neural network models in the task at hand. Classification accuracy for stroma, tumor and other reached 96.1% on an independent test set. Among the three classes the best model gained the highest accuracy (99.3%) for class tumor. When TSR was predicted with the best model, the correlation between the predicted values and values estimated by an experienced pathologist was 0.57. Further research is needed to study associations between computationally predicted TSR values and other clinicopathological factors of colorectal cancer and the overall survival of the patients.
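
The step from per-tile class predictions to a slide-level TSR, and the comparison against a pathologist's estimates, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' pipeline: the function names are hypothetical, TSR is taken to be the stroma fraction of the combined tumor and stroma tiles, and the correlation is assumed to be Pearson's (the abstract does not state which coefficient was used).

```python
import math
from collections import Counter


def tumor_stroma_ratio(tile_labels):
    """Stroma fraction among tiles labelled 'stroma' or 'tumor'.

    Assumption: TSR = stroma / (stroma + tumor); tiles labelled
    'other' are ignored.
    """
    counts = Counter(tile_labels)
    stroma, tumor = counts["stroma"], counts["tumor"]
    if stroma + tumor == 0:
        raise ValueError("no stroma or tumor tiles to compute TSR from")
    return stroma / (stroma + tumor)


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)
```

With per-slide predictions and pathologist estimates collected in two lists, `pearson(predicted_tsr, pathologist_tsr)` would give a figure comparable in kind to the reported 0.57, under the assumptions above.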

artificial intelligence, colorectal cancer, machine learning, (18 more...)

doi: 10.1371/journal.pone.0286270

2212.14652

Country:

Europe > Finland > Central Finland > Jyväskylä (0.05)
Europe > Finland > Southwest Finland > Turku (0.05)
Europe > Finland > Pirkanmaa > Tampere (0.05)
(2 more...)

Genre: Research Report > New Finding (0.69)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Moslem, Yasmin, Haque, Rejwanul, Kelleher, John D., Way, Andy

Domain-Specific Text Generation for Machine Translation

arXiv.org Artificial Intelligence, Aug-11-2022

Preservation of domain knowledge from the source to target is crucial in any translation workflow. It is common in the translation industry to receive highly specialized projects, where there is hardly any parallel in-domain data. In such scenarios where there is insufficient in-domain data to fine-tune Machine Translation (MT) models, producing translations that are consistent with the relevant context is challenging. In this work, we propose a novel approach to domain adaptation leveraging state-of-the-art pretrained language models (LMs) for domain-specific data augmentation for MT, simulating the domain characteristics of either (a) a small bilingual dataset, or (b) the monolingual source text to be translated. Combining this idea with back-translation, we can generate huge amounts of synthetic bilingual in-domain data for both use cases. For our investigation, we use the state-of-the-art Transformer architecture. We employ mixed fine-tuning to train models that significantly improve translation of in-domain texts. More specifically, in both scenarios, our proposed methods achieve improvements of approximately 5-6 BLEU and 2-3 BLEU, respectively, on the Arabic-to-English and English-to-Arabic language pairs. Furthermore, the outcome of human evaluation corroborates the automatic evaluation results.
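
As a rough illustration of the data side of mixed fine-tuning, the sketch below builds a training corpus that combines generic sentence pairs with oversampled in-domain (for example, synthetic back-translated) pairs. The function name, the oversampling rule (repeat in-domain data until it roughly matches the generic data in size), and the seed are assumptions for illustration; the abstract does not specify the mixing ratio the authors used.

```python
import random


def build_mixed_corpus(generic_pairs, in_domain_pairs, seed=0):
    """Mix generic and oversampled in-domain (source, target) pairs.

    Assumption: in-domain pairs are repeated so that both parts are of
    roughly equal size, then the combined corpus is shuffled.
    """
    if not in_domain_pairs:
        raise ValueError("in_domain_pairs must not be empty")
    repeats = max(1, round(len(generic_pairs) / len(in_domain_pairs)))
    mixed = list(generic_pairs) + list(in_domain_pairs) * repeats
    random.Random(seed).shuffle(mixed)  # fixed seed for reproducibility
    return mixed
```

The resulting list would then be fed to whatever fine-tuning routine the MT toolkit provides; that step is toolkit-specific and is left out here.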

computational linguistic, machine learning, natural language, (16 more...)

2208.05909

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
(21 more...)

Genre:

Research Report (1.00)
Workflow (0.88)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Dec-9-2021

Semantic Search as Extractive Paraphrase Span Detection

Kanerva, Jenna, Kitti, Hanna, Chang, Li-Hsin, Vahtola, Teemu, Creutz, Mathias, Ginter, Filip

In this paper, we approach the problem of semantic search by framing the search task as paraphrase span detection, i.e. given a segment of text as a query phrase, the task is to identify its paraphrase in a given document, the same modelling setup as typically used in extractive question answering. On the Turku Paraphrase Corpus of 100,000 manually extracted Finnish paraphrase pairs including their original document context, we find that our paraphrase span detection model outperforms two strong retrieval baselines (lexical similarity and BERT sentence embeddings) by 31.9pp and 22.4pp respectively in terms of exact match, and by 22.3pp and 12.9pp in terms of token-level F-score. This demonstrates a strong advantage of modelling the task in terms of span retrieval, rather than sentence similarity. Additionally, we introduce a method for creating artificial paraphrase data through back-translation, suitable for languages where manually annotated paraphrase resources for training the span detection model are not available.
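
The two reported metrics, exact match and token-level F-score, can be sketched in the style commonly used for extractive question answering. This is a minimal illustration, assuming whitespace tokenization and no text normalization; the paper's own evaluation script may differ, and the function names are hypothetical.

```python
from collections import Counter


def exact_match(predicted: str, gold: str) -> bool:
    """True if the predicted span has exactly the gold tokens, in order."""
    return predicted.split() == gold.split()


def token_f1(predicted: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between two spans."""
    pred_tokens = predicted.split()
    gold_tokens = gold.split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a perfect match, one empty as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Token-level F-score gives partial credit to a span that overlaps the gold paraphrase, whereas exact match only rewards spans with identical boundaries, which is one reason the two metrics show different margins over the baselines.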

computational linguistic, prediction, proceedings, (13 more...)

doi: 10.1007/s10579-023-09715-7

2112.04886

Country:

Europe > Finland > Southwest Finland > Turku (0.27)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Media (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)

arXiv.org Machine Learning, Nov-29-2020

Persistent Reductions in Regularized Loss Minimization for Variable Selection

Jalali, Amin

In the context of regularized loss minimization with polyhedral gauges, we show that for a broad class of loss functions (possibly non-smooth and non-convex) and under a simple geometric condition on the input data it is possible to efficiently identify a subset of features which are guaranteed to have zero coefficients in all optimal solutions in all problems with loss functions from said class, before any iterative optimization has been performed for the original problem. This procedure is standalone, takes only the data as input, and does not require any calls to the loss function. Therefore, we term this procedure as a persistent reduction for the aforementioned class of regularized loss minimization problems. This reduction can be efficiently implemented via an extreme ray identification subroutine applied to a polyhedral cone formed from the datapoints. We employ an existing output-sensitive algorithm for extreme ray identification which makes our guarantee and algorithm applicable in ultra-high dimensional problems.

conv, extreme ray, setup 2, (14 more...)

2011.14549

Country: Europe > Switzerland > Zürich > Zürich (0.04)

Genre:

Overview (0.67)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Arya, Sakshi, Yang, Yuhong

To update or not to update? Delayed Nonparametric Bandits with Randomized Allocation

arXiv.org Machine Learning, May-26-2020

Contextual bandits provide a natural framework to model many practical sequential decision-making problems in various fields. Woodroofe (1979) started studying multi-armed bandit problems with side information in a parametric framework, and Yang and Zhu (2002) initiated an investigation from a nonparametric perspective. See Lai (2001) and Bartroff et al. (2008) for reviews on general sequential problems and Bubeck and Cesa-Bianchi (2012) for bandits exclusively. In recent years, bandit problems have gained popularity and have been studied extensively under different names, such as contextual bandits, multi-armed bandits with covariates (MABC), associative bandit problems and multi-armed bandits with side information. For example, when treating patients with a disease, the doctor needs to decide which treatment amongst several competing treatments would be the best for the current patient, given the patient's covariate information and data available from previous patients. Most bandit algorithms assume that rewards are observed instantaneously, but in most practical situations, rewards are only obtained after some delay. For example, it is often the case that several other patients have to be treated before the outcome for the current patient is observed. One way to tackle this problem is to adopt black-box procedures incorporating delayed rewards using the already existing no-delay policies in the stochastic bandits setting.

data mining, machine learning, mean reward function, (18 more...)

2005.13078

Country:

North America > United States > Minnesota (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine Learning, Jan-9-2020

D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis for Multiple High-dimensional Datasets

Shu, Hai, Qu, Zhe, Zhu, Hongtu

Such studies include The Cancer Genome Atlas (TCGA; Hoadley et al., 2018) with multi-platform genomic data for tumor samples, and the Human Connectome Project (HCP; Van Essen et al., 2013) with multi-modal brain images of healthy adults, among many others (Crawford et al., 2016; Jensen et al., 2017). The use of multiple data types can enhance our understanding of the etiology of many complex diseases, such as cancers (Ciriello et al., 2015; Campbell et al., 2018) and neurodegenerative diseases (Weiner et al., 2013; Saeed et al., 2017). Researchers have hence become highly interested in studying the shared information and individual features across multi-type datasets through separating their common and distinctive variation structures (van der Kloet et al., 2016; Smilde et al., 2017; Li et al., 2018). Let $Y_k \in \mathbb{R}^{p_k \times n}$ be the $k$-th row-mean centered dataset obtained on a common set of $n$ objects for $k = 1, \dots, K$, where $p_k$ is the number of variables for the $k$-th dataset. One popular approach for disentangling their common and distinctive variation structures is to decompose each data matrix into

$$Y_k = X_k + E_k = C_k + D_k + E_k \quad \text{for } k = 1, \dots, K, \tag{1}$$

where $\{X_k\}_{k=1}^{K}$ are low-rank signal matrices with $\{E_k\}_{k=1}^{K}$ being additive noise matrices, $\{C_k\}_{k=1}^{K}$ are low-rank common-variation matrices that represent the signal data coming from the common mechanism shared across all datasets, and $\{D_k\}_{k=1}^{K}$ are low-rank distinctive-variation matrices, each from the distinctive mechanism of a single dataset that is not shared by all.

cov, knull 2, span, (13 more...)

2001.02856

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

arXiv.org Machine Learning, Jan-21-2019

Sensitivity Analysis of Deep Neural Networks

Shu, Hai, Zhu, Hongtu

Deep neural networks (DNNs) have achieved superior performance in various prediction tasks, but can be very vulnerable to adversarial examples or perturbations. Therefore, it is crucial to measure the sensitivity of DNNs to various forms of perturbations in real applications. We introduce a novel perturbation manifold and its associated influence measure to quantify the effects of various perturbations on DNN classifiers. Such perturbations include various external and internal perturbations to input samples and network parameters. The proposed measure is motivated by information geometry and provides desirable invariance properties. We demonstrate that our influence measure is useful for four model building tasks: detecting potential 'outliers', analyzing the sensitivity of model architectures, comparing network sensitivity between training and test sets, and locating vulnerable areas. Experiments show reasonably good performance of the proposed measure for the popular DNN models ResNet50 and DenseNet121 on CIFAR10 and MNIST datasets.

influence measure, neural network, perturbation, (16 more...)

1901.07152

Country:

North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > New York (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.40)

Industry: Information Technology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)